Biostatistics For Dummies (Monika Wahi John Pezzullo)

part of the graph.

© John Wiley & Sons, Inc.

FIGURE 17-3: Diagnostic graphs from a regression.

Determining how well the model fits the data

Several calculations in standard regression output indicate how closely the model fits your data:

The residual SE is the typical distance of the observed points from the fitted model. You want the points

to be close to the line. As shown in Figure 17-2, the residual SE is about

mmHg.

The multiple r² value represents the proportion of the variability in the dependent variable explained by

the model, so you want it to be high. As shown in Figure 17-2, it is 0.52 in this example, indicating

a moderately good fit.

A statistically significant F statistic indicates that the model predicts the outcome significantly

better than the null model (one with no predictors). As shown in Figure 17-2, the p value on the F statistic is 0.009, which is

statistically significant at α = 0.05.
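
As a rough illustration of how these three quantities relate, the following Python sketch computes them for a straight-line model with one predictor. The data values and the function name fit_summary are made up for illustration and are not the blood pressure example from Figure 17-2. The p value for the F statistic needs the F distribution, so that step is left to your statistical software.

```python
import math

# Hypothetical data (not from the book): one predictor x and one outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]


def fit_summary(x, y):
    """Fit y = intercept + slope * x by least squares and return fit statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares slope and intercept for a single predictor.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    predicted = [intercept + slope * xi for xi in x]

    # Scatter left over after fitting, and total scatter around the mean of y.
    ss_resid = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    df_model = 1      # one predictor
    df_resid = n - 2  # n points minus two fitted parameters

    return {
        # Residual SE: typical distance of the points from the line, in y units.
        "residual_se": math.sqrt(ss_resid / df_resid),
        # r squared: proportion of the variability in y explained by the model.
        "r_squared": 1 - ss_resid / ss_total,
        # F statistic: explained variance per model df over residual variance.
        "f_statistic": ((ss_total - ss_resid) / df_model) / (ss_resid / df_resid),
        "df_resid": df_resid,
    }


summary = fit_summary(x, y)
for name, value in summary.items():
    print(name, value)
```

A small residual SE, an r² near 1, and a large F statistic all point the same way: the fitted line stays close to the observed points.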

Figure 17-4 shows another way to judge how well the model predicts the outcome. It’s a graph of

observed and predicted values of the outcome variable, with a superimposed identity line (the line on

which predicted value = observed value). Your program may offer this observed versus predicted graph, or you can

generate it from the observed and predicted values of the dependent variable. For a perfect prediction

model, the points would lie exactly on the identity line. The correlation coefficient of these points is

the multiple r value for the regression.
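
If your program doesn't offer this graph, a minimal Python sketch like the following shows how you could build the observed and predicted pairs yourself and check that their correlation matches the multiple r value. The data and the helper names least_squares_line and pearson_r are hypothetical, chosen only for illustration, and the model is assumed to be a straight line with one predictor.

```python
import math

# Hypothetical data (not from the book): one predictor x and one outcome y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
observed = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares straight line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(a, b):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


intercept, slope = least_squares_line(x, observed)
predicted = [intercept + slope * xi for xi in x]

# Each (observed, predicted) pair is one point on the graph; points that fall
# on the identity line have predicted equal to observed.
points = list(zip(observed, predicted))

# Correlation of observed with predicted values: the multiple r of the fit.
multiple_r = pearson_r(observed, predicted)

# r squared computed directly from the residual and total sums of squares.
mean_obs = sum(observed) / len(observed)
ss_resid = sum((o - p) ** 2 for o, p in points)
ss_total = sum((o - mean_obs) ** 2 for o in observed)
r_squared = 1 - ss_resid / ss_total

print("multiple r:", multiple_r)
print("multiple r squared:", multiple_r ** 2)
print("r squared from sums of squares:", r_squared)
```

Squaring the correlation of the observed and predicted values gives the same r² that the regression output reports, which is why this graph is a visual version of that statistic.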